Abstract

We provide a simple example that illustrates the advantage of adaptive over non-adaptive 
strategies for quantum channel discrimination. In particular, we give a pair of entanglement- 
breaking channels that can be perfectly discriminated by means of an adaptive strategy that re- 
quires just two channel evaluations, but for which no non-adaptive strategy can give a perfect 
discrimination using any finite number of channel evaluations. 

1 Introduction 

This paper concerns the problem of quantum channel discrimination. In this problem, two quantum
channels $\Phi_0$ and $\Phi_1$ are fixed, and access to one of the two channels is made available. It is not
known which of the two channels has been made available, however, and the goal is to correctly
identify which of $\Phi_0$ and $\Phi_1$ it is. Several papers, including [Ací01, AKN98, CPR00, CDP08,
DPP01, DFY09, Hay08, Kit97, PW09, Sac05b, Sac05a, WY06, Wat08], have discovered many interesting
aspects of quantum channel discrimination. There exist related topics in the study of quantum
information theory, including quantum parameter estimation (see, for instance, [FI03, IH09, JWD+08]
and the references therein), but this paper will focus just on the specific problem of channel
discrimination.

A discrimination strategy for a quantum channel discrimination problem is a step-by-step procedure
consisting of channel evaluations, along with quantum state preparations, operations, and
measurements, that attempts to output the identity of the given channel. One is typically
interested in discrimination strategies that satisfy certain natural constraints; one well-studied
example is the class of strategies allowing a single evaluation of the unknown channel. An optimal
discrimination strategy, among those satisfying a given collection of constraints, is simply one that
maximizes the probability that the unknown channel is correctly identified, assuming it is selected
according to a fixed distribution that is known ahead of time.

One interesting aspect of quantum channel discrimination is that the use of an ancillary system is
generally necessary for the optimal discrimination of two quantum channels, assuming just a single
evaluation of the unknown channel is made available [Kit97, AKN98, DPP01, KSV02]. In more precise
terms, the optimal strategy to discriminate two channels may require that one first prepares the
input system to the unknown channel in an entangled state with an ancillary system, followed by a
joint measurement of that channel's output together with the ancillary system. Even
entanglement-breaking channels are sometimes better discriminated through the use of an ancillary
system, despite the fact that their output systems must necessarily be unentangled with the ancillary
system after their evaluation [Sac05a]. There are two known special classes of channels that require
no ancillary system for optimal discrimination: the unitary channels [AKN98, CPR00] and the classical
channels.

There is a striking possibility for quantum channel discrimination problems that cannot occur 
in the classical setting. If a pair of classical channels cannot be perfectly distinguished with one 
evaluation, then they cannot be perfectly distinguished with any finite number of evaluations. (This 
fact is easily proved, and a simple proof may be found later in the paper.) In contrast, it is possible for 
a pair of quantum channels to be discriminated perfectly when multiple evaluations are available, but 
not in the single evaluation case. For example, this generally happens in the case of unitary channels
[Ací01].

Another interesting aspect of quantum channel discrimination is the distinction between adaptive 
and non-adaptive strategies when multiple uses of the unknown channel are made available. In an 
adaptive strategy, one may use the outputs of previous uses of the channel when preparing the input 
to subsequent uses, whereas a non-adaptive strategy requires that the inputs to all uses of the given 
channel are chosen before any of them is evaluated. It was found in [CDP08] that unitary channels
are insensitive to this distinction; adaptive strategies do not give any advantage over non-adaptive
strategies for unitary channel discrimination. In the same paper, a pair of memory channels was shown
to require an adaptive scheme for optimal discrimination, but the question of whether or not there
exist ordinary (non-memory) channels with a similar property was stated as an open question.
Although an example of three channels that require adaptive strategies for an optimal identification
was presented in [WY06], we were not able to find any example of a pair of (ordinary, non-memory)
channels in the literature that require adaptive strategies for optimal discrimination, and so the
question appears to have been unresolved prior to this work.

The purpose of the present paper is to demonstrate the necessity of adaptive schemes for optimal 
quantum channel discrimination. We do this by presenting an example of two quantum channels 
that can be perfectly discriminated given two adaptive channel evaluations, but for which no finite 
number of non-adaptive channel evaluations allows for a perfect discrimination. The channels in 
our example are entanglement-breaking channels, which provides further evidence suggesting that 
entanglement-breaking channels share similar properties to general quantum channels with respect 
to channel discrimination tasks. We note that a recent paper of Duan, Feng, and Ying [DFY09] has
provided a criterion for the perfect discrimination of pairs of quantum channels, as well as a general
method to find adaptive strategies that allow for perfect discrimination. While no explicit examples
were given in that paper, the existence of pairs of channels with similar properties to those in our
example is implied. Our example was, however, obtained independently from that paper, and we hope
that it offers some insight into the problem of quantum channel discrimination that is complementary
to [DFY09].

Finally, we note that a related (but weaker) phenomenon occurs in the context of classical channel 
discrimination. That is, there exist classical channels that can be better discriminated by adaptive 
strategies than by non-adaptive strategies, and we provide three simple examples illustrating this 
phenomenon. While we suspect that similar examples illustrating the advantages of adaptive dis- 
crimination strategies may be known to some researchers, we did not find any in the literature. That 
such examples exist is also interesting when contrasted with the fact that adaptive strategies for 
classical channel discrimination cannot improve the asymptotic rate at which the error probability 
exponentially decays with the number of channel uses [Hay08].
2 Preliminaries



We will begin by summarizing some of the notation and terminology that is used in the subsequent
sections of the paper. We will let $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{W}$ denote finite-dimensional complex Hilbert
spaces, which will typically correspond to the input, output, and ancillary systems to be associated
with channel discrimination tasks. The notation $L(\mathcal{X}, \mathcal{Y})$ refers to the space of all linear operators
from $\mathcal{X}$ to $\mathcal{Y}$, $L(\mathcal{X})$ is shorthand for $L(\mathcal{X}, \mathcal{X})$, and $D(\mathcal{X})$ refers to the set of all density
operators on $\mathcal{X}$. A similar notation is used for other spaces in place of $\mathcal{X}$ and $\mathcal{Y}$. The identity
operator on $\mathcal{X}$ is denoted $\mathbb{1}_{\mathcal{X}}$.

For the example to be presented in the main part of the paper, we will let $\mathcal{X}$ and $\mathcal{Y}$ be the spaces
associated with two qubits and one qubit, respectively. The standard bases for these spaces are
therefore $\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$ and $\{|0\rangle, |1\rangle\}$. As is common, we will also write

$$|+\rangle = \frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle \quad\text{and}\quad |-\rangle = \frac{1}{\sqrt{2}}|0\rangle - \frac{1}{\sqrt{2}}|1\rangle,$$

and we write tensor products of these states and standard basis states in a self-explanatory way (e.g.,
$|1+\rangle = |1\rangle|+\rangle$).

A quantum channel is a linear mapping of the form $\Phi : L(\mathcal{X}) \to L(\mathcal{Y})$ that is both completely
positive and trace-preserving. Every such quantum channel $\Phi$ can be expressed in Kraus form as

$$\Phi(X) = \sum_{j=1}^{m} A_j X A_j^{*}$$

for some choice of linear operators $A_1, \ldots, A_m \in L(\mathcal{X}, \mathcal{Y})$ satisfying the constraint

$$\sum_{j=1}^{m} A_j^{*} A_j = \mathbb{1}_{\mathcal{X}}.$$

The identity channel mapping $L(\mathcal{W})$ to itself is denoted $\mathbb{1}_{L(\mathcal{W})}$.

The distinguishability of two quantum channels $\Phi_0, \Phi_1 : L(\mathcal{X}) \to L(\mathcal{Y})$ may be quantified by
the distance induced by the diamond norm (or completely bounded trace norm)

$$\|\Phi_0 - \Phi_1\|_{\diamond} = \max_{\rho \in D(\mathcal{X} \otimes \mathcal{W})} \left\| (\Phi_0 \otimes \mathbb{1}_{L(\mathcal{W})})(\rho) - (\Phi_1 \otimes \mathbb{1}_{L(\mathcal{W})})(\rho) \right\|_1, \qquad (1)$$

where here $\mathcal{W}$ is assumed to have dimension at least that of $\mathcal{X}$. This quantity represents the greatest
possible degree of distinguishability that can result by feeding an input state into the two channels,
allowing for the possibility that the input system is entangled with an ancillary system. Assuming
that a bit $a \in \{0, 1\}$ is uniformly chosen at random, the quantity

$$\frac{1}{2} + \frac{1}{4}\|\Phi_0 - \Phi_1\|_{\diamond}$$

represents the optimal probability to correctly determine the value of $a$ by means of a physical process
involving just a single evaluation of the channel $\Phi_a$. It therefore holds that $\Phi_0$ and $\Phi_1$ are perfectly
distinguishable using a single evaluation if and only if $\|\Phi_0 - \Phi_1\|_{\diamond} = 2$.
3 Specification of the example and a perfect discrimination protocol



We now describe our example of two quantum channels that are better discriminated using an adaptive
strategy than by any non-adaptive strategy. First, we will give an intuitive description of the
channels. The two channels, $\Phi_0$ and $\Phi_1$, both map two qubits to one and operate as follows.

• Channel $\Phi_0$ measures the first input qubit with respect to the standard basis. If the result is 0, it
outputs the state $|0\rangle$. If the result is 1, it measures the second qubit with respect to the standard
basis. If the result is 0, then it outputs $|0\rangle$, and if the result is 1, then it outputs the completely
mixed state $\mathbb{1}/2$.

• Channel $\Phi_1$ measures the first input qubit with respect to the standard basis. If the result is 0, it
outputs the state $|+\rangle$. If the result is 1, it measures the second qubit with respect to the
$\{|+\rangle, |-\rangle\}$ basis. If the result is $+$, then it outputs $|1\rangle$, and if the result is $-$, then it outputs
the completely mixed state $\mathbb{1}/2$.

The intuition behind these channels is as follows. If the first input qubit is set to 0, then the output
is a "key" state: $|0\rangle$ for channel $\Phi_0$ and $|+\rangle$ for the channel $\Phi_1$. If the first input is set to 1, and the
second input qubit is the channel's "key" state, then the channel identifies itself (i.e., $\Phi_0$ outputs 0
and $\Phi_1$ outputs 1). If, however, the first input qubit is set to 1 and the second qubit's state is
orthogonal to the channel's "key" state, then the channel outputs the completely mixed state. This
effectively means that the channel provides no information about its identity in this case.

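It is easy to discriminate these two channels with an adaptive strategy that requires two uses of
the unknown channel. The following diagram describes such a strategy:

[Diagram: the first channel evaluation takes $|0\rangle \otimes \rho$ as input; its output qubit becomes the second
input qubit of the second evaluation, whose first input qubit is $|1\rangle$; the output of the second
evaluation is measured in the standard basis, yielding 0 for $\Phi_0$ and 1 for $\Phi_1$.]

Here, the state $\rho$ input as the second qubit of the first channel evaluation is arbitrary, as it is
effectively discarded by both of the channels when the first input qubit is set to $|0\rangle$.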

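In the interest of precision, and because it will be useful for the analysis of the next section, we
note the following formal specifications of these channels. It holds that

$$\Phi_0(X) = \sum_{j=1}^{5} A_j X A_j^{*} \quad\text{and}\quad \Phi_1(X) = \sum_{j=1}^{5} B_j X B_j^{*}$$

for

$$A_1 = |0\rangle\langle 00|, \quad A_2 = |0\rangle\langle 01|, \quad A_3 = |0\rangle\langle 10|, \quad A_4 = \tfrac{1}{\sqrt{2}}|0\rangle\langle 11|, \quad A_5 = \tfrac{1}{\sqrt{2}}|1\rangle\langle 11|,$$

$$B_1 = |+\rangle\langle 00|, \quad B_2 = |+\rangle\langle 01|, \quad B_3 = |1\rangle\langle 1+|, \quad B_4 = \tfrac{1}{\sqrt{2}}|0\rangle\langle 1-|, \quad B_5 = \tfrac{1}{\sqrt{2}}|1\rangle\langle 1-|.$$

It is clear that $\Phi_0$ and $\Phi_1$ are both entanglement-breaking channels, as all of these Kraus operators
have rank one [HSR03].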
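As a numerical sanity check of the specification above, the following sketch builds the ten Kraus operators with NumPy, confirms that each set satisfies the trace-preservation constraint, and simulates the two-evaluation adaptive strategy. The helper names (`op`, `apply_channel`, `adaptive_outcome`) and the choice of the completely mixed state for the discarded qubit are illustrative assumptions, not part of the original text.

```python
import numpy as np

s = 1 / np.sqrt(2)
ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = s * (ket0 + ket1), s * (ket0 - ket1)


def op(out, in1, in2):
    """Return the real 2x4 matrix |out><in1 in2|."""
    return np.outer(out, np.kron(in1, in2))


# Kraus operators of Phi_0 (list A) and Phi_1 (list B), as listed above.
A = [op(ket0, ket0, ket0), op(ket0, ket0, ket1), op(ket0, ket1, ket0),
     s * op(ket0, ket1, ket1), s * op(ket1, ket1, ket1)]
B = [op(plus, ket0, ket0), op(plus, ket0, ket1), op(ket1, ket1, plus),
     s * op(ket0, ket1, minus), s * op(ket1, ket1, minus)]


def apply_channel(kraus, rho):
    """Apply the channel with the given (real) Kraus operators to a 4x4 state."""
    return sum(K @ rho @ K.T for K in kraus)


def adaptive_outcome(kraus):
    """Run the two-evaluation adaptive strategy; return (P(0), P(1))."""
    junk = np.eye(2) / 2  # arbitrary second input qubit (assumed maximally mixed)
    key = apply_channel(kraus, np.kron(np.outer(ket0, ket0), junk))
    out = apply_channel(kraus, np.kron(np.outer(ket1, ket1), key))
    return np.diag(out)  # standard-basis measurement statistics


for name, kraus in (("Phi_0", A), ("Phi_1", B)):
    completeness = sum(K.T @ K for K in kraus)
    assert np.allclose(completeness, np.eye(4))  # trace preservation
    print(name, np.round(adaptive_outcome(kraus), 6))
```

Under these assumptions the final measurement yields 0 with certainty for $\Phi_0$ and 1 with certainty for $\Phi_1$, in agreement with the intuitive description of the protocol.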
4 Sub-optimality of non-adaptive strategies



We now prove that non-adaptive strategies cannot allow for a perfect discrimination of the channels
$\Phi_0$ and $\Phi_1$ defined in the previous section, for any finite number $n$ of channel uses. In more precise
terms, we have

$$\left\| \Phi_0^{\otimes n} - \Phi_1^{\otimes n} \right\|_{\diamond} < 2$$

for all choices of $n \in \mathbb{N}$.

We first prove a simpler mathematical fact, which is that there does not exist a two-qubit density
operator $\rho$ for which $\Phi_0(\rho)$ and $\Phi_1(\rho)$ are perfectly distinguishable. As we will see, the proof is
similar when taking the tensor product of the channel with itself or with an identity channel that
acts on an auxiliary system. This handles the multiple-copy case with the possible use of an ancillary
space, thus establishing the more general statement above.

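Assume toward contradiction that there exists a density operator $\rho$ such that $\Phi_0(\rho)$ and $\Phi_1(\rho)$ are
perfectly distinguishable. By a simple convexity argument, we may assume that the same is true for
a pure state $|\psi\rangle\langle\psi|$ in place of $\rho$. In other words, there exists a unit vector $|\psi\rangle$ satisfying

$$\operatorname{Tr}\big( \Phi_1(|\psi\rangle\langle\psi|)\, \Phi_0(|\psi\rangle\langle\psi|) \big) = 0.$$

Expanding this equation in terms of the Kraus operators of $\Phi_0$ and $\Phi_1$ yields

$$\sum_{j=1}^{5} \sum_{k=1}^{5} \left| \langle\psi| B_j^{*} A_k |\psi\rangle \right|^2 = 0. \qquad (2)$$

Each of the terms in this sum is nonnegative, and must therefore be zero, i.e.,
$\langle\psi| B_j^{*} A_k |\psi\rangle = 0$ for all choices of $j, k \in \{1, \ldots, 5\}$. It follows that

$$\langle\psi| \left( \sum_{j=1}^{5} \sum_{k=1}^{5} \alpha_{j,k} B_j^{*} A_k \right) |\psi\rangle = 0 \qquad (3)$$

for every choice of complex numbers $\{\alpha_{j,k} : 1 \le j, k \le 5\}$.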

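We will now obtain a contradiction by choosing the coefficients $\{\alpha_{j,k} : 1 \le j, k \le 5\}$ in such a
way that (3) cannot hold. In particular, by letting

$$\alpha_{1,1} = \alpha_{2,2} = \sqrt{2}, \quad \alpha_{3,5} = \alpha_{4,3} = 1, \quad\text{and}\quad \alpha_{4,4} = -2\sqrt{2},$$

and letting $\alpha_{j,k} = 0$ for all of the remaining values of $j$ and $k$, we find that

$$\sum_{j=1}^{5} \sum_{k=1}^{5} \alpha_{j,k} B_j^{*} A_k = P$$

for

$$P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1/2 & -1/2 \\ 0 & 0 & -1/2 & 3/2 \end{pmatrix} = |00\rangle\langle 00| + |01\rangle\langle 01| + |11\rangle\langle 11| + |1-\rangle\langle 1-|.$$

The operator $P$ is positive definite and therefore $\langle\psi| P |\psi\rangle > 0$ for every nonzero vector $|\psi\rangle$, which is
in contradiction with (3). Having established a contradiction, we conclude that there cannot exist a
density operator $\rho$ such that $\Phi_0(\rho)$ and $\Phi_1(\rho)$ are perfectly distinguishable, as claimed.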
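The choice of coefficients can be checked numerically. The sketch below, which assumes NumPy and uses zero-based array indices (so `alpha[j-1, k-1]` stores $\alpha_{j,k}$), forms the weighted sum of the operators $B_j^{*} A_k$ and compares it with the matrix $P$; the variable names are illustrative only.

```python
import numpy as np

s = 1 / np.sqrt(2)
ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = s * (ket0 + ket1), s * (ket0 - ket1)


def op(out, in1, in2):
    """Return the real 2x4 matrix |out><in1 in2|."""
    return np.outer(out, np.kron(in1, in2))


A = [op(ket0, ket0, ket0), op(ket0, ket0, ket1), op(ket0, ket1, ket0),
     s * op(ket0, ket1, ket1), s * op(ket1, ket1, ket1)]
B = [op(plus, ket0, ket0), op(plus, ket0, ket1), op(ket1, ket1, plus),
     s * op(ket0, ket1, minus), s * op(ket1, ket1, minus)]

# Nonzero coefficients alpha_{j,k}; all remaining entries stay zero.
alpha = np.zeros((5, 5))
alpha[0, 0] = alpha[1, 1] = np.sqrt(2)   # alpha_{1,1}, alpha_{2,2}
alpha[2, 4] = alpha[3, 2] = 1.0          # alpha_{3,5}, alpha_{4,3}
alpha[3, 3] = -2 * np.sqrt(2)            # alpha_{4,4}

# All operators are real, so the adjoint B_j^* is just the transpose.
P = sum(alpha[j, k] * (B[j].T @ A[k]) for j in range(5) for k in range(5))

P_claimed = np.array([[1, 0, 0, 0],
                      [0, 1, 0, 0],
                      [0, 0, 0.5, -0.5],
                      [0, 0, -0.5, 1.5]])

eigenvalues = np.linalg.eigvalsh(P_claimed)
print(np.allclose(P, P_claimed), eigenvalues.min())
```

The smallest eigenvalue is $1 - 1/\sqrt{2} \approx 0.293$, which is strictly positive, so $P$ is indeed positive definite.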
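Now let us consider the general setting where an arbitrary finite number $n$ of (non-adaptive)
channel uses, as well as an ancillary system of arbitrary size, are permitted. We may follow a similar
proof to the one above to show that there cannot exist a unit vector $|\psi\rangle$ such that

$$\operatorname{Tr}\left[ (\Phi_1^{\otimes n} \otimes \mathbb{1}_{L(\mathcal{W})})(|\psi\rangle\langle\psi|)\, (\Phi_0^{\otimes n} \otimes \mathbb{1}_{L(\mathcal{W})})(|\psi\rangle\langle\psi|) \right] = 0, \qquad (4)$$

where $\mathcal{W}$ is the space (of arbitrary finite dimension) that is to be associated with the ancillary system.
We may express the relevant mappings in this expression in terms of the Kraus operators of $\Phi_0$ and
$\Phi_1$ as follows:

$$(\Phi_0^{\otimes n} \otimes \mathbb{1}_{L(\mathcal{W})})(X) = \sum_{1 \le j_1, \ldots, j_n \le 5} (A_{j_1} \otimes \cdots \otimes A_{j_n} \otimes \mathbb{1}_{\mathcal{W}})\, X\, (A_{j_1} \otimes \cdots \otimes A_{j_n} \otimes \mathbb{1}_{\mathcal{W}})^{*},$$

$$(\Phi_1^{\otimes n} \otimes \mathbb{1}_{L(\mathcal{W})})(X) = \sum_{1 \le j_1, \ldots, j_n \le 5} (B_{j_1} \otimes \cdots \otimes B_{j_n} \otimes \mathbb{1}_{\mathcal{W}})\, X\, (B_{j_1} \otimes \cdots \otimes B_{j_n} \otimes \mathbb{1}_{\mathcal{W}})^{*}.$$

Now, for the same coefficients $\{\alpha_{j,k} : 1 \le j, k \le 5\}$ that were defined above, we find that

$$\sum_{\substack{1 \le j_1, \ldots, j_n \le 5 \\ 1 \le k_1, \ldots, k_n \le 5}} \alpha_{j_1,k_1} \cdots \alpha_{j_n,k_n}\, B_{j_1}^{*} A_{k_1} \otimes \cdots \otimes B_{j_n}^{*} A_{k_n} \otimes \mathbb{1}_{\mathcal{W}} = P^{\otimes n} \otimes \mathbb{1}_{\mathcal{W}},$$

which is again positive definite. Therefore, there cannot exist a nonzero vector $|\psi\rangle$ for which

$$\langle\psi|\, B_{j_1}^{*} A_{k_1} \otimes \cdots \otimes B_{j_n}^{*} A_{k_n} \otimes \mathbb{1}_{\mathcal{W}}\, |\psi\rangle = 0$$

for all $j_1, \ldots, j_n, k_1, \ldots, k_n$. Consequently, (4) does not hold for any nonzero vector $|\psi\rangle$, which implies
that $\Phi_0$ and $\Phi_1$ cannot be perfectly discriminated by means of a non-adaptive strategy.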
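For a concrete instance of the multi-copy identity, the following sketch checks the case $n = 2$ with a one-qubit ancilla by brute-force summation over all index tuples. The ancilla dimension `dim_w = 2` and all variable names are assumptions made for illustration; the argument in the text holds for any finite $n$ and any ancilla dimension.

```python
import numpy as np
from itertools import product

s = 1 / np.sqrt(2)
ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = s * (ket0 + ket1), s * (ket0 - ket1)


def op(out, in1, in2):
    """Return the real 2x4 matrix |out><in1 in2|."""
    return np.outer(out, np.kron(in1, in2))


A = [op(ket0, ket0, ket0), op(ket0, ket0, ket1), op(ket0, ket1, ket0),
     s * op(ket0, ket1, ket1), s * op(ket1, ket1, ket1)]
B = [op(plus, ket0, ket0), op(plus, ket0, ket1), op(ket1, ket1, plus),
     s * op(ket0, ket1, minus), s * op(ket1, ket1, minus)]

alpha = np.zeros((5, 5))
alpha[0, 0] = alpha[1, 1] = np.sqrt(2)
alpha[2, 4] = alpha[3, 2] = 1.0
alpha[3, 3] = -2 * np.sqrt(2)

# Weighted single-copy terms alpha_{j,k} B_j^* A_k (4x4 each).
C = [[alpha[j, k] * (B[j].T @ A[k]) for k in range(5)] for j in range(5)]
P = sum(C[j][k] for j in range(5) for k in range(5))

dim_w = 2                      # assumed ancilla dimension
I_w = np.eye(dim_w)

# Sum over all (j1, k1, j2, k2) of the two-copy terms tensored with 1_W.
total = np.zeros((16 * dim_w, 16 * dim_w))
for j1, k1, j2, k2 in product(range(5), repeat=4):
    total += np.kron(np.kron(C[j1][k1], C[j2][k2]), I_w)

target = np.kron(np.kron(P, P), I_w)
min_eig = np.linalg.eigvalsh(target).min()
print(np.allclose(total, target), min_eig)
```

The smallest eigenvalue of $P^{\otimes 2} \otimes \mathbb{1}_{\mathcal{W}}$ is $(1 - 1/\sqrt{2})^2 \approx 0.086$, so the operator remains positive definite, as the proof requires.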

When the number of evaluations $n$ of the unknown channel is small, one can efficiently compute
the value $\|\Phi_0^{\otimes n} - \Phi_1^{\otimes n}\|_{\diamond}$ because it is the optimal value of a semidefinite programming problem
[Wat09]. For instance, it holds that

$$\|\Phi_0 - \Phi_1\|_{\diamond} = 1 + \frac{1}{\sqrt{2}},$$

and therefore the channels can be discriminated with a probability

$$\frac{1}{2} + \frac{1}{4}\|\Phi_0 - \Phi_1\|_{\diamond} \approx 0.9268$$

of correctness with just a single channel evaluation. For two non-adaptive queries, we used CVX
[GB09, GB08], a package for specifying and solving convex programs in Matlab, to approximate the
value

$$\frac{1}{2} + \frac{1}{4}\|\Phi_0 \otimes \Phi_0 - \Phi_1 \otimes \Phi_1\|_{\diamond} \approx 0.9771.$$

One can also obtain an upper bound on the probability of success using any feasible solution to the
dual problem. In fact, even obvious choices give fairly tight upper bounds. Thus, we establish a
small, but finite, advantage of an adaptive strategy over a non-adaptive one for discriminating these
channels.
5 Remarks on classical channel discrimination



The channels in our example above are entanglement-breaking channels, yet the optimal adaptive 
discriminating strategy operates in a distinctively quantum way: one out of two nonorthogonal key 
states is extracted from the first channel evaluation and coherently input to the second. A natural 
question arises, which is whether adaptive strategies also help when discriminating classical channels. 
It turns out that adaptive strategies indeed are better in the classical setting, although in a more 
limited respect. This section discusses a few basic facts and examples that illustrate this claim. 

A classical channel can, of course, be succinctly represented by a stochastic matrix $M$, where the
vector $M|k\rangle$ represents the output distribution when the input is $k$. Throughout this section, we will
let $M_0$ and $M_1$ denote the two possible channels in a classical channel discrimination problem.



Advantages of adaptive classical strategies 

We will present three examples illustrating that adaptive strategies may give advantages over
non-adaptive strategies for classical channel discrimination, restricting our attention to the special
case where just two channel evaluations are permitted, and where one of two channels is given with
equal probability. We have the following expression for the optimal success probability using an
adaptive strategy in this setting:

$$\frac{1}{2} + \frac{1}{4} \max_{k, f} \sum_{j} \left\| M_0(j,k)\, M_0|f(j)\rangle - M_1(j,k)\, M_1|f(j)\rangle \right\|_1. \qquad (5)$$

In this expression, $j$ and $k$ range over all outputs and inputs, respectively, of the channels $M_0$ and $M_1$
(i.e., they are row and column indices). The function $f$ ranges over all maps from outputs to inputs
(or row indices to column indices). Here $k$ is the input to the first evaluation, and $f(j)$ is the input
to the second evaluation when the first evaluation produces the output $j$.
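Expression (5) can be evaluated directly for small channels. The sketch below is one possible implementation using exact rational arithmetic from the Python standard library; the function names are hypothetical. It evaluates (5) both literally, by enumerating every map $f$, and in a decoupled form that picks $f(j)$ separately for each output $j$ (valid because the summand for $j$ depends on $f$ only through $f(j)$). The test matrices are those of Example 2 from later in this section.

```python
from fractions import Fraction
from itertools import product


def mat(rows):
    """Parse rows of whitespace-separated decimals into exact Fractions."""
    return [[Fraction(x) for x in row.split()] for row in rows]


def term(M0, M1, j, k, l):
    """|| M0(j,k) M0|l> - M1(j,k) M1|l> ||_1 (zero-based indices)."""
    return sum(abs(M0[j][k] * M0[i][l] - M1[j][k] * M1[i][l])
               for i in range(len(M0)))


def adaptive_brute_force(M0, M1):
    """Evaluate (5) literally: maximize over k and over every map f."""
    rows, cols = len(M0), len(M0[0])
    best = max(sum(term(M0, M1, j, k, f[j]) for j in range(rows))
               for k in range(cols)
               for f in product(range(cols), repeat=rows))
    return Fraction(1, 2) + best / 4


def adaptive_decoupled(M0, M1):
    """Same value, choosing f(j) independently for each output j."""
    rows, cols = len(M0), len(M0[0])
    best = max(sum(max(term(M0, M1, j, k, l) for l in range(cols))
                   for j in range(rows))
               for k in range(cols))
    return Fraction(1, 2) + best / 4


# Channels of Example 2: columns are inputs, rows are outputs.
M0 = mat(["0.86 0.45 1 0.5", "0.14 0.1 0 0.5", "0 0.45 0 0"])
M1 = mat(["0.15 0.1 0.5 0", "0.85 0.8 0.5 1", "0 0.1 0 0"])

value = adaptive_brute_force(M0, M1)
assert value == adaptive_decoupled(M0, M1)
print(float(value))
```

For these matrices both routines return 371/400 = 0.9275, the adaptive success probability quoted in Example 2.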

An alternate expression for the optimal success probability (5) is

$$\frac{1}{2} + \frac{1}{4} \max_{k} \sum_{j} q(j,k) \max_{l} \left\| p_0(j,k)\, M_0|l\rangle - p_1(j,k)\, M_1|l\rangle \right\|_1,$$

where

$$q(j,k) = M_0(j,k) + M_1(j,k)$$

and where

$$p_a(j,k) = \frac{M_a(j,k)}{M_0(j,k) + M_1(j,k)}$$

is the probability that the unknown channel is $M_a$, conditioned on $k$ being chosen as the input and
$j$ being obtained as the output. This illustrates that, at least for strategies allowing just two channel
evaluations, the optimal adaptive strategy for a classical channel can be readily found. We first find
the optimal input for each prior distribution over the chosen channel (this may be the input in the
second use), and compute the success probability given every prior distribution and one use of the
channel. Finally, to choose an input to the first use of the channel, we choose an input that
maximizes the sum, over the priors it can produce, of the probability of getting each prior times the
success probability given that prior.

Example 1. This "minimal" example shows that adaptive strategies are better than non-adaptive ones
in some cases. The two channels are given by:

$$M_0 = \begin{pmatrix} 1/3 & 8/9 \\ 2/3 & 1/9 \end{pmatrix}, \qquad M_1 = \begin{pmatrix} 0 & 1/3 \\ 1 & 2/3 \end{pmatrix}.$$

One can verify that the best two-evaluation non-adaptive strategy is to input 1 to both of the channel
uses, which leads to a correct identification with probability 7/9. The best adaptive strategy is to take
$k = 2$ and $f(1) = 2$, $f(2) = 1$ in the formula (5), which gives a correct identification with probability
65/81. Similar examples are abundant.

Example 2. Here, the optimal 1-shot input is never used in the optimal non-adaptive scheme. The
idea is to start with two optimal 1-shot inputs $k, k'$ such that using $k'$ becomes more informative with
2 parallel uses. Then we perturb the $k$-th column slightly so that $k$ becomes the unique optimal 1-shot
input. In this example, the optimal 1-shot input $k$ still serves as the first input to the optimal adaptive
scheme.

Let the two channels be given by:

$$M_0 = \begin{pmatrix} 0.86 & 0.45 & 1 & 0.5 \\ 0.14 & 0.1 & 0 & 0.5 \\ 0 & 0.45 & 0 & 0 \end{pmatrix}, \qquad M_1 = \begin{pmatrix} 0.15 & 0.1 & 0.5 & 0 \\ 0.85 & 0.8 & 0.5 & 1 \\ 0 & 0.1 & 0 & 0 \end{pmatrix}.$$

The best one-shot input is $k = 1$, with probability of success 0.855 (whereas $k' = 2$). The best parallel
input pairs are (2, 3) and (3, 2), with probability of success 0.9. Allowing adaptation, and using $k = 1$
as the first input with $f(1) = 3$, $f(2) = 4$, $f(3) = 1$, the probability of success is 0.9275.
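The non-adaptive figures quoted in Example 2 can be reproduced by exhaustive search. The following sketch (function names are illustrative) computes the one-shot success probability for each input and the success probability for each ordered pair of parallel inputs, using exact fractions and reporting inputs with the one-based numbering used in the text.

```python
from fractions import Fraction
from itertools import product


def mat(rows):
    """Parse rows of whitespace-separated decimals into exact Fractions."""
    return [[Fraction(x) for x in row.split()] for row in rows]


def one_shot(M0, M1, k):
    """Success probability with a single use on input k (zero-based)."""
    dist = sum(abs(M0[i][k] - M1[i][k]) for i in range(len(M0)))
    return Fraction(1, 2) + dist / 4


def parallel(M0, M1, k1, k2):
    """Success probability with two parallel uses on inputs k1 and k2."""
    n = len(M0)
    dist = sum(abs(M0[i][k1] * M0[j][k2] - M1[i][k1] * M1[j][k2])
               for i in range(n) for j in range(n))
    return Fraction(1, 2) + dist / 4


# Channels of Example 2: columns are inputs, rows are outputs.
M0 = mat(["0.86 0.45 1 0.5", "0.14 0.1 0 0.5", "0 0.45 0 0"])
M1 = mat(["0.15 0.1 0.5 0", "0.85 0.8 0.5 1", "0 0.1 0 0"])

cols = range(len(M0[0]))
single = {k + 1: one_shot(M0, M1, k) for k in cols}
pairs = {(a + 1, b + 1): parallel(M0, M1, a, b)
         for a, b in product(cols, repeat=2)}

best_single = max(single, key=single.get)
best_pair_value = max(pairs.values())
best_pairs = sorted(p for p, v in pairs.items() if v == best_pair_value)

print(best_single, float(single[best_single]))
print(best_pairs, float(best_pair_value))
```

The search returns input 1 with success probability 0.855 for a single use, and the pairs (2, 3) and (3, 2) with success probability 0.9 for two parallel uses, so the best one-shot input does not appear in the best parallel pair.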

Example 3. In this final example, the optimal 1-shot input is not the first input to the optimal adaptive
scheme. The idea is to have two optimal 1-shot inputs in which one is more informative than the other
if given a second use. Then, we perturb the column corresponding to the less informative input to be
slightly better for the 1-shot case.

Let the two channels be given by:

$$M_0 = \begin{pmatrix} 1 & 0.5 & 0.828 & 0.76 \\ 0 & 0.5 & 0.092 & 0.04 \\ 0 & 0 & 0.08 & 0.2 \end{pmatrix}, \qquad M_1 = \begin{pmatrix} 0.5 & 0 & 0.092 & 0.04 \\ 0.5 & 1 & 0.828 & 0.76 \\ 0 & 0 & 0.08 & 0.2 \end{pmatrix}.$$

The best one-shot input is 3 (probability of success is 0.868), but the best parallel input pairs to two
uses are (3, 4) and (4, 3) (probability of success is 0.9336). The optimal adaptive scheme uses $k = 4$ as
the first input, and $f(j) = j$ for $j = 1, 2, 3$, resulting in a probability of success of 0.9536.
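The adaptive scheme of Example 3 can be recovered by maximizing expression (5) and recording the maximizing first input $k$ and map $f$. The sketch below is an illustrative implementation with hypothetical function names; it reports indices one-based, matching the text, and also finds the best one-shot input for comparison.

```python
from fractions import Fraction


def mat(rows):
    """Parse rows of whitespace-separated decimals into exact Fractions."""
    return [[Fraction(x) for x in row.split()] for row in rows]


def term(M0, M1, j, k, l):
    """|| M0(j,k) M0|l> - M1(j,k) M1|l> ||_1 (zero-based indices)."""
    return sum(abs(M0[j][k] * M0[i][l] - M1[j][k] * M1[i][l])
               for i in range(len(M0)))


def best_adaptive(M0, M1):
    """Return (success probability, first input k, map f), one-based."""
    rows, cols = len(M0), len(M0[0])
    best = None
    for k in range(cols):
        # For fixed k, the best second input can be chosen per output j.
        f = [max(range(cols), key=lambda l: term(M0, M1, j, k, l))
             for j in range(rows)]
        total = sum(term(M0, M1, j, k, f[j]) for j in range(rows))
        if best is None or total > best[0]:
            best = (total, k + 1, [l + 1 for l in f])
    return Fraction(1, 2) + best[0] / 4, best[1], best[2]


def best_one_shot(M0, M1):
    """Return (success probability, input), one-based, for a single use."""
    cols = range(len(M0[0]))
    dist = {k + 1: sum(abs(M0[i][k] - M1[i][k]) for i in range(len(M0)))
            for k in cols}
    k = max(dist, key=dist.get)
    return Fraction(1, 2) + dist[k] / 4, k


# Channels of Example 3: columns are inputs, rows are outputs.
M0 = mat(["1 0.5 0.828 0.76", "0 0.5 0.092 0.04", "0 0 0.08 0.2"])
M1 = mat(["0.5 0 0.092 0.04", "0.5 1 0.828 0.76", "0 0 0.08 0.2"])

value, k, f = best_adaptive(M0, M1)
single_value, single_k = best_one_shot(M0, M1)
print(float(value), k, f)
print(float(single_value), single_k)
```

With these matrices the maximization selects $k = 4$ and $f(j) = j$ with success probability 0.9536, while the best single-use input is 3 with success probability 0.868.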

Perfect classical strategies 

Finally, we give a simple proof of a fact claimed in the introduction of this paper, which is that if 
two classical channels are not perfectly distinguishable with a single evaluation, then they cannot be 
perfectly distinguished by any finite number of evaluations, even using an adaptive strategy. We will 
prove the contrapositive of this statement. 

Suppose that two classical channels $M_0$ and $M_1$ are perfectly discriminated by a discrimination
strategy that uses $n$ channel evaluations. Without loss of generality we may assume the strategy takes
the general form suggested in Figure 1. The assumption that the strategy perfectly discriminates $M_0$
and $M_1$ means that the final output distributions for the cases $a = 0$ and $a = 1$ have disjoint support.
Our goal is to prove that $M_0$ and $M_1$ are perfectly discriminated with a single evaluation.

Figure 1: The structure of a general discrimination strategy for classical channels. This example
makes four channel evaluations, each illustrated by a box labeled $M_a$, but in general any finite
number $n$ of evaluations may be considered. Each arrow represents a register that may be in a random
mixture over some finite set of classical states, and the boxes labeled $F_0$, $F_1$, $F_2$ and $F_3$ represent
arbitrary functions (or random processes) that must be independent of the value $a \in \{0, 1\}$ that
indicates which of the two channels is given.

The proof of this statement proceeds by induction on $n$. In case $n = 1$ there is nothing to prove,
so assume that $n \ge 2$. Consider the two distributions $q_0$ and $q_1$ that are illustrated in the figure.
Each distribution $q_a$ represents the state of the discrimination strategy immediately before the final
channel evaluation takes place, assuming the unknown channel is given by $M_a$. There are two cases:
$q_0$ and $q_1$ have disjoint support, or they do not. If $q_0$ and $q_1$ do have disjoint support, then terminating
the discrimination strategy after $n - 1$ channel evaluations allows for a perfect discrimination, so by
the induction hypothesis it is possible to discriminate the channels with a single evaluation. In the
other case, where $q_0$ and $q_1$ do not have disjoint supports, there must exist a classical state $x$ of the
strategy at the time under consideration for which $q_0(x)$ and $q_1(x)$ are both positive. Given that the
discrimination strategy is perfect, and therefore has final distributions with disjoint supports, it must
hold that evaluating $M_0$ and $M_1$ on $x$ results in distributions with disjoint supports. Therefore, $M_0$
and $M_1$ can be discriminated with a single channel evaluation as required.
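The single-evaluation criterion that the proof reduces to is easy to test: two classical channels are perfectly discriminated with one evaluation exactly when some input yields output distributions with disjoint supports. The sketch below checks this condition; the function name and both pairs of toy matrices are hypothetical and are not taken from the examples above.

```python
def perfectly_distinguishable_one_shot(M0, M1):
    """True if some input k gives output columns with disjoint supports."""
    rows, cols = len(M0), len(M0[0])
    return any(all(M0[j][k] == 0 or M1[j][k] == 0 for j in range(rows))
               for k in range(cols))


# Hypothetical toy channels: input 1 (first column) separates them perfectly.
toy0 = [[1, 0.5],
        [0, 0.5]]
toy1 = [[0, 0.5],
        [1, 0.5]]

# Hypothetical noisy variant: every column pair overlaps in support.
noisy0 = [[0.9, 0.5],
          [0.1, 0.5]]
noisy1 = [[0.1, 0.5],
          [0.9, 0.5]]

print(perfectly_distinguishable_one_shot(toy0, toy1))
print(perfectly_distinguishable_one_shot(noisy0, noisy1))
```

By the statement proved above, a pair that fails this test, such as the noisy pair, cannot be perfectly discriminated by any finite number of evaluations, adaptive or not.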

6 Conclusion 

In this paper, we presented a pair of quantum channels that can be discriminated perfectly by a strat- 
egy making two adaptive channel evaluations, but which cannot be perfectly discriminated non- 
adaptively with any finite number of channel evaluations. 

One natural question that arises is whether our example can be generalized to show a similar 
advantage of general adaptive strategies making n channel evaluations versus strategies that make 
channel evaluations with depth at most n — 1. Although our example can be generalized in a natural 
way, we have not proved that it has the required properties with respect to depth n — 1 strategies. 

Finally, for the example we have presented, we have found that although strategies making two
non-adaptive channel evaluations cannot be perfect, they can be correct with high probability (about
97.7%). What is the largest possible gap between optimal adaptive versus non-adaptive strategies
making two (or any other number of) channel evaluations? The only upper bound we have on this
gap is that channels $\Phi_0$ and $\Phi_1$ that are perfectly discriminated by two (adaptive or non-adaptive)
evaluations must satisfy $\|\Phi_0 - \Phi_1\|_{\diamond} \ge 1$, and can therefore be discriminated (with a single
evaluation) with probability at least 3/4 of correctness.